In [1]:
# If you have any problem with NLTK restart your kernel and run this:
# import nltk
# nltk.download('stopwords')
# nltk.download('wordnet')

from scipy.stats import spearmanr, pearsonr
from src.models.predict_emotions import *
from src.utils.plot_3D_actor_emotions import plot_3D_actor_plot_emotion, plot_3D_actor_review_emotion
from src.utils.plot_bars_actors_emotions import plot_actors_emotion_selector
from src.models.kmeans_emotional_type import get_best_k_clustered_movie_emotional_type
from src.data.actor_name_statistics import get_actors_name_and_statistics
from src.utils.plot_genres import plot_emotion_distribution
from src.utils.plot_ratings import emotion_distribution_by_movie_rating
from src.scripts.emotion_evolution import *
from src.models.predict_emotions import read_tsv_predicted_emotions
from src.scripts.emotion_transitions import *
from src.utils.plot_countries_plots import plot_world_map_emotion_by_genre, plot_world_map_average_rating
from src.utils.plot_ratings_by_dominant_emotion import plot_ratings_by_most_dominant_emotion
from src.utils.plot_top_words import generate_word_clouds_by_emotion
from src.scripts.emotion_transitions import *
from src.scripts.emotion_evolution import *
from src.utils.plots_month_trends import *
from src.utils.plots_genres import *
from src.data.normalize_emotions import *
from src.utils.plot_movie_emotional_type import *
from src.data.load_data import *

# Reload modules automatically so that changes to any dependency are reflected in the notebook (from the ML course)
%load_ext autoreload
%autoreload 2
All model checkpoint layers were used when initializing TFRobertaForSequenceClassification.

All the layers of TFRobertaForSequenceClassification were initialized from the model checkpoint at j-hartmann/emotion-english-distilroberta-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/guillaumevitalis/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/guillaumevitalis/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
In [2]:
import plotly
plotly.offline.init_notebook_mode()

From Screen To Heart

How do emotions shape the cinematic landscape?

Scripts to create and process our datasets

(We do not recommend running these functions; some of them took several days to run):

Load movie metadata:

df_movies_metadata, df_movies_language, df_movies_countries, df_movies_genres = load_and_clean_movies_df()

Load movie reviews from kaggle:

df_reviews_kaggle = load_movie_reviews_kaggle(df_movies_metadata)

Load mapping IMDB id to wikipedia ID:

df_mapping_imdb_id_wikipedia_id = load_imdb_id_wikipedia_id(df_movies_metadata)

Load movie average reviews from the non-commercial IMDB:

df_imdb_average_reviews = load_imdb_average_reviews(df_mapping_imdb_id_wikipedia_id)

Predict emotions

Compute plot emotions:

predict_emotions_to_tsv(df_movies, column='plot', file_name='plot_emotions.tsv')

Merging df_movies with plot_emotions.tsv:

df_movies_with_emotions = merge_df_with_emotions_tsv(df_movies, file_name='plot_emotions.tsv', prefix='plot')

Compute review emotions:

predict_emotions_to_tsv(df_reviews, column='review_detail', file_name='review_emotions.tsv', is_review=True)

Merging df_reviews with review_emotions.tsv:

df_reviews_with_emotions = merge_df_with_emotions_tsv(df_reviews, file_name='review_emotions.tsv', prefix='review', is_review=True)

Scrape more years and months:

We scrape our year and month data using:

scrap_years_months_movies(df_movies_metadata)

Then we load the resulting CSV and combine the dates from the original dataset with the scraped ones:

save_final_dates(get_final_dates(load_scrapped_dates()))

Scrape more reviews:

scrape_reviews(df_mapping_imdb_id_wikipedia_id)

Scripts to normalize emotion scores

We normalize the plot emotion scores using:

df_movies_with_emotions_normalized = normalize_total_plot_emotions(df_movies_with_emotions, with_neutral=False)

We normalize the review emotion scores using:

df_reviews_with_emotions_normalized = normalize_review_emotions(df_reviews_with_emotions, with_neutral=False)
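
As a rough illustration of what normalizing without the neutral class could look like, the sketch below rescales the six emotion scores of each row so that they sum to 1. The column naming (`plot_joy`, `plot_neutral`, ...) and the helper name `renormalize_without_neutral` are assumptions made for this example; the actual logic lives in `src.data.normalize_emotions` and may differ.

```python
import pandas as pd

EMOTIONS = ["joy", "sadness", "anger", "disgust", "fear", "surprise"]

def renormalize_without_neutral(df: pd.DataFrame, prefix: str = "plot") -> pd.DataFrame:
    """Hypothetical helper: rescale the six emotion columns so each row sums to 1.

    Assumes columns are named '<prefix>_<emotion>'; a neutral column, if present,
    is simply left out of the total.
    """
    cols = [f"{prefix}_{emotion}" for emotion in EMOTIONS]
    out = df.copy()
    totals = out[cols].sum(axis=1)
    # Rows whose six scores sum to 0 cannot be rescaled and become NaN.
    out[cols] = out[cols].div(totals.where(totals > 0), axis=0)
    return out
```

With this sketch, a row scoring 0.2 for joy, 0.1 for sadness, 0.1 for anger and 0.6 for neutral would end up with 0.5 for joy and 0.25 each for sadness and anger.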

Let's first load our data:

In [3]:
df_movies, df_language, df_countries, df_genres, df_reviews = load_final_movies_and_reviews()

df_movies_with_emotions = merge_df_with_emotions_tsv(
    df_movies, 
    file_name='plot_emotions.tsv', 
    prefix='plot'
)

df_movies_with_emotions_normalized = normalize_total_plot_emotions(
    df_movies_with_emotions, 
    with_neutral=False
)

df_reviews_with_emotions = merge_df_with_emotions_tsv(
    df_reviews, 
    file_name='review_emotions.tsv', 
    prefix='review', 
    is_review=True)

df_reviews_with_emotions_normalized = normalize_review_emotions(
    df_reviews_with_emotions, 
    with_neutral=False
)
df_characters = load_character_metadata()[["wikipedia_ID", "actor_name", "freebase_ID_actor"]].dropna(axis=0)
df_main_genres = get_genres_merged(df_genres)

plot_emotions_df = read_tsv_predicted_emotions('plot_emotions.tsv')
emotions_split_df = split_movies_emotions_and_genres(plot_emotions_df, df_genres)
genres_list = df_main_genres.drop("wikipedia_ID", axis=1).columns
emotion_list = ["joy", "sadness", "anger", "disgust", "fear", "surprise"]
In [4]:
print("Our dataset has {} different movies".format(len(df_movies_with_emotions_normalized)))
print("Without combining any genres, we have {} different genres".format(df_genres.shape[1]))
print("We also have {} different countries".format(df_countries.shape[1]))
print("We also have {} different reviews".format(len(df_reviews)))
Our dataset has 11564 different movies
Without combining any genres, we have 336 different genres
We also have 105 different countries
We also have 449824 different reviews

Wordclouds and word counts

We display the word clouds and word count distributions to check that our model is reliable and to get better insight into the words associated with each emotion.
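
The counting step behind such a display can be sketched as follows. This is a simplified illustration and not the code of `generate_word_clouds_by_emotion`: it assumes each plot has already been assigned a single dominant emotion, uses a tiny hand-written stopword set instead of the NLTK stopwords, and skips lemmatization. The function name `top_words_by_emotion` is hypothetical.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword set; the notebook itself relies on NLTK's list.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "his", "her", "he", "she"}

def top_words_by_emotion(plots, emotions, n=10):
    """Count words per emotion label and return the n most frequent ones."""
    counts = defaultdict(Counter)
    for text, emotion in zip(plots, emotions):
        tokens = re.findall(r"[a-z]+", text.lower())
        counts[emotion].update(t for t in tokens if t not in STOPWORDS)
    return {emotion: counter.most_common(n) for emotion, counter in counts.items()}
```

Called with a list of plot summaries and a parallel list of emotion labels, it returns a dictionary mapping each emotion to its `(word, count)` pairs, most frequent first.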

In [5]:
generate_word_clouds_by_emotion(df_movies_with_emotions_normalized, show_wordclouds=True)
Top 10 words per emotion:

Emotion: Anger
  kill: 4191
  police: 3263
  father: 3156
  man: 3089
  try: 3024
  friend: 2971
  later: 2929
  make: 2926
  return: 2902
  time: 2860

Emotion: Disgust
  man: 1896
  film: 1884
  woman: 1685
  house: 1449
  time: 1389
  father: 1386
  find: 1371
  later: 1349
  home: 1318
  life: 1298

Emotion: Fear
  house: 1920
  escape: 1816
  man: 1724
  kill: 1707
  time: 1663
  begin: 1659
  return: 1556
  later: 1551
  find: 1519
  police: 1512

Emotion: Joy
  love: 897
  time: 742
  film: 724
  friend: 721
  new: 699
  life: 695
  family: 687
  father: 656
  make: 597
  home: 577

Emotion: Sadness
  life: 3329
  father: 3156
  friend: 2919
  love: 2901
  home: 2798
  time: 2735
  new: 2678
  mother: 2619
  day: 2524
  family: 2466

Emotion: Surprise
  new: 136
  time: 133
  day: 113
  life: 106
  find: 103
  end: 101
  friend: 100
  man: 91
  make: 86
  love: 86

Interpretations

Our interpretations of these word clouds and word counts, for each emotion, are:

  • Joy: Joyful movie plots center on themes of love, relationships, and familial bonds. Words like 'love,' 'friend,' 'family,' and 'father' highlight positive connections and emotional warmth. The frequent appearance of 'new,' 'life,' and 'time' suggests an emphasis on fresh beginnings and the passage of meaningful moments. Together, these words convey a sense of happiness and fulfillment.

  • Sadness: Sad plots delve into themes of loss, longing, and introspection. Words like 'life,' 'father,' 'friend,' and 'mother' suggest a focus on family and close relationships. The prevalence of 'home,' 'love,' and 'day' underscores the significance of cherished spaces and moments. 'New' and 'time' reflect transitions and the passage of grief, while 'family' anchors the narrative in deep emotional connections.

  • Anger: Plots characterized by anger often revolve around themes of conflict, violence, and tension. Words like 'kill' and 'police' suggest elements of crime or law enforcement entangled in the narrative. 'Father,' 'friend,' and 'man' may point to strained personal relationships as central conflicts. The recurring presence of 'try,' 'make,' and 'return' conveys a sense of action, persistence, and attempts at resolution, while 'time' and 'later' imply a progression of events leading to climactic outcomes.

  • Disgust: Disgust-driven plots often explore unsettling or morally challenging scenarios. Words like 'man,' 'woman,' and 'father' point to interpersonal dynamics, often within a familial or societal context. 'House,' 'home,' and 'life' suggest a focus on domestic settings, possibly as stages for conflict or discomfort.

  • Fear: Fearful narratives center around suspense, danger, and survival. Words like 'house' and 'escape' indicate scenarios of confinement or flight, while 'kill' and 'police' evoke life-threatening stakes. 'Man,' 'find,' and 'time' suggest individuals uncovering sinister events or racing against time. Words like 'begin' and 'return' convey a cyclical nature to the story, as characters face recurring or escalating threats.

  • Surprise: Surprise-themed narratives often revolve around unexpected revelations and new beginnings. Words like 'new,' 'time,' and 'day' emphasize fresh starts and pivotal moments. The appearance of 'find,' 'end,' and 'life' suggests uncovering hidden truths or reaching transformative conclusions. 'Friend,' 'man,' and 'love' imply personal relationships as central to the surprise, while 'make' indicates characters taking action to adapt or react.

How are our movies classified?

In [6]:
plot_genres_proportions_sorted(df_genres, False)

As you can see, there are too many genres, so we combined them into broader genre classes.

In [7]:
plot_genres_proportions_sorted(df_main_genres, True)
print(f"Number of final genres: {df_main_genres.shape[1]}")
Number of final genres: 23

Correlation between genres and emotions

The first heatmap reveals how each emotion naturally aligns with specific film genres, based on the narratives present in the plots.

The second heatmap shows the p-values, which indicate which of the positive or negative correlations observed in the first heatmap are statistically significant.

In [8]:
generate_emotion_genre_heatmap(df_main_genres, df_movies_with_emotions_normalized)
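
For readers who want to reproduce the two heatmaps' underlying numbers, here is a minimal sketch of one way to get a correlation coefficient and a p-value for every (emotion, genre) pair. It assumes a Pearson correlation between a 0/1 genre indicator and the normalized emotion score, with both frames aligned row by row; `generate_emotion_genre_heatmap` may use a different statistic, and the function name below is hypothetical.

```python
import pandas as pd
from scipy.stats import pearsonr

def genre_emotion_correlations(genres: pd.DataFrame, emotions: pd.DataFrame):
    """Return (correlations, p-values), each indexed by emotion with one column per genre.

    `genres` holds 0/1 indicators and `emotions` holds scores; both are assumed
    to contain the same movies in the same order.
    """
    corr = pd.DataFrame(index=emotions.columns, columns=genres.columns, dtype=float)
    pvals = corr.copy()
    for emotion in emotions.columns:
        for genre in genres.columns:
            r, p = pearsonr(emotions[emotion], genres[genre])
            corr.loc[emotion, genre] = r
            pvals.loc[emotion, genre] = p
    return corr, pvals
```

A cell of the first frame far from 0 whose matching cell in the second frame is below the chosen threshold (0.05 is a common choice) would then be read as a significant association.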

Interpretations

In the first heatmap, we can see the following for each emotion:

  • Joy: Joy illuminates comedy, romance, and musicals, genres crafted to bring smiles and inspire dreams. However, in horror, thrillers, and crime genres, it disappears entirely, overshadowed by darker emotions.

  • Sadness: Sadness thrives in dramas and historical films, where tears flow and sacrifices mark the spirits.

  • Anger: Anger fuels action and crime, bringing energy and intensity to explosive narratives.

  • Disgust: Disgust, true to its nature, excels in horror and experimental films, unsettling audiences with uncomfortable scenes.

  • Fear: Fear reigns supreme in horror, thrillers, action, and science fiction, dominating suspenseful and adrenaline-packed stories. Yet, in musicals and romance, it is notably absent: panic is hard to evoke in the middle of a graceful choreography or during a candlelit dinner.

  • Surprise: Surprise shines in science fiction and adventure but occasionally appears unexpectedly in teen or comedy genres, adding captivating twists. However, surprise is less present in the drama, war, and history genres.

In the second heatmap, we see that the Experimental and Teen genres show no significant correlation with fear. After all, who would expect terror in enigmatic works or in an adolescent story centered on daily concerns? Science fiction, on the other hand, has nothing to do with anger, preferring to marvel at distant galaxies and futuristic utopias.

Moreover, Historical films lack both disgust and anger. These absences were expected, as these emotions are often felt by viewers but rarely explicitly expressed in film plots.

The Propaganda genre has only a very weak but significant link with disgust, reflecting its informative and manipulative intent rather than emotional engagement.

Finally, War shows no correlation with sadness, anger, or disgust, instead favoring fear and surprise to convey the intensity of conflicts, even though the absent emotions are widely recognized as ones audiences feel deeply when watching this type of film.

Global interpretation

This suggests that the emotions audiences commonly associate with certain genres are not always directly conveyed through the plots or film narratives but instead emerge from how viewers perceive and interpret the stories, thanks to the filmmakers' work.

Emotion distribution of movie plots across genres

We plot the emotion distribution of movie plots across genres to better visualize how emotions are distributed across genres.

In [9]:
plot_emotion_distribution(df_movies_with_emotions_normalized, df_main_genres, is_review=False)
plot_emotion_distribution(df_reviews_with_emotions_normalized, df_main_genres, is_review=True)
In [10]:
plot_emotion_distribution(df_movies_with_emotions_normalized, df_main_genres, is_review=False, specific_emotion="joy")
plot_emotion_distribution(df_reviews_with_emotions_normalized, df_main_genres, is_review=True, specific_emotion="joy")

Interpretation

In movie plots, genres like Musical, Romance, and Comedy exhibit high levels of Joy, while darker genres such as Disaster, Horror, Thriller, and Science Fiction display the lowest Joy scores. The same trend is evident in reviews, where Musical maintains the highest Joy score, and Horror, Disaster, and Thriller remain at the bottom. However, the scale differs significantly compared to plots, indicating that reviews generally express a stronger sense of Joy than plots.

In [11]:
plot_emotion_distribution(df_movies_with_emotions_normalized, df_main_genres, is_review=False, specific_emotion="sadness")
plot_emotion_distribution(df_reviews_with_emotions_normalized, df_main_genres, is_review=True, specific_emotion="sadness")

Interpretation

Movie plots in genres like Romance, Musical, and Drama feature higher Sadness scores, while Horror and Experimental have the lowest. In reviews, Sadness scores are relatively consistent across genres, with only a 2.27% variation between the highest and lowest scores, suggesting a more balanced emotional reaction among reviewers regardless of genre.

In [12]:
plot_emotion_distribution(df_movies_with_emotions_normalized, df_main_genres, is_review=False, specific_emotion="anger")
plot_emotion_distribution(df_reviews_with_emotions_normalized, df_main_genres, is_review=True, specific_emotion="anger")

Interpretation

Plots for genres with action-packed themes like Western, Crime, and Action show a significant increase in Anger scores, while Documentary plots have notably lower scores. In reviews, Propaganda stands out with the highest Anger score, possibly reflecting audience disagreement or criticism of these films. Conversely, Animated and Musical genres have the lowest Anger scores, likely due to their lighter themes and younger target audience, which reduce expressions of anger in reviews.

In [13]:
plot_emotion_distribution(df_movies_with_emotions_normalized, df_main_genres, is_review=False, specific_emotion="disgust")
plot_emotion_distribution(df_reviews_with_emotions_normalized, df_main_genres, is_review=True, specific_emotion="disgust")

Interpretation

Plots with the highest Disgust scores belong to Experimental, Pornographic, and Horror genres. Explicit or unsettling themes in these genres likely drive these elevated scores. Meanwhile, Musical has the lowest Disgust score, which aligns with its joyful and family-friendly nature. In reviews, Pornographic, Horror, and Propaganda genres top the Disgust scale. Explicit content likely influences the scores for the first two, while the audience's ideological disagreement may account for the high Disgust in Propaganda reviews. Musical, again, scores the lowest.

In [14]:
plot_emotion_distribution(df_movies_with_emotions_normalized, df_main_genres, is_review=False, specific_emotion="fear")
plot_emotion_distribution(df_reviews_with_emotions_normalized, df_main_genres, is_review=True, specific_emotion="fear")

Interpretation

The highest Fear scores in plots come from Disaster, Fantasy, and Horror genres, which often feature catastrophic events, jump scares, and elements designed to instill fear. Romance and Musical, by contrast, evoke the least fear. The same pattern is observed in reviews, suggesting a strong alignment between plot content and audience reactions for this emotion.

In [15]:
plot_emotion_distribution(df_movies_with_emotions_normalized, df_main_genres, is_review=False, specific_emotion="surprise")
plot_emotion_distribution(df_reviews_with_emotions_normalized, df_main_genres, is_review=True, specific_emotion="surprise")

Interpretation

Plots in Science Fiction and Teen genres have the highest Surprise scores, likely due to unexpected twists or novel concepts. The lowest Surprise scores are found in War, Historical, and Western genres, which tend to follow more predictable or stereotyped narratives. In reviews, Experimental and Science Fiction genres receive the highest Surprise scores, indicating that audiences find these genres more innovative or unpredictable. Horror, however, has the lowest Surprise score, possibly due to predictable scares.

Clustering of movies based on their mean plot emotions

We cluster our movies based on their emotional profile (movie plot emotions and review emotions) with the KMeans algorithm to see whether we can discover interesting clusters. First, we find the optimal number of clusters by computing the silhouette score for k from 2 to 10.

In [16]:
get_best_k_clustered_movie_emotional_type(normalize_total_plot_emotions(df_movies_with_emotions), False, 2, 10);
get_best_k_clustered_movie_emotional_type(normalize_review_emotions(df_reviews_with_emotions), True, 2, 10);
Silhouettes: 
       score
k           
2   0.230025
3   0.197798
4   0.201764
5   0.203953
6   0.166844
7   0.159833
8   0.161146
9   0.156791
10  0.157342
The best k is 2
Silhouettes: 
       score
k           
2   0.373393
3   0.307277
4   0.280370
5   0.241740
6   0.235454
7   0.232456
8   0.211425
9   0.195069
10  0.196879
The best k is 2

As we can see, k=2 maximizes the silhouette score in both cases, so we cluster with k=2 next.
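
The selection of k described above can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the function name `best_k_by_silhouette`, the `n_init` and `random_state` values, and the synthetic demo data are not taken from `get_best_k_clustered_movie_emotional_type`, whose internals may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k_by_silhouette(X, k_min=2, k_max=10, random_state=0):
    """Fit KMeans for each k in [k_min, k_max] and return the k with the highest silhouette score."""
    scores = {}
    for k in range(k_min, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Synthetic demo: two tight, well-separated groups of six-dimensional "emotion profiles".
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 0.1, (50, 6)), rng.normal(1.0, 0.1, (50, 6))])
best_k, scores = best_k_by_silhouette(X_demo, k_min=2, k_max=5)
```

The silhouette score lies between -1 and 1, with higher values meaning tighter and better-separated clusters, so the values around 0.2 to 0.4 reported above point to clusters that overlap noticeably.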

In [17]:
plot_emotions_mean_genres, clusters_plots = plot_clustered_movie_emotional_type(normalize_total_plot_emotions(df_movies_with_emotions), df_genres, False, k=2, clusters_color = {0: "steelblue", 1: "palevioletred"}, clusters_col_subplot = {1:1, 0:2})
reviews_emotions_mean_genres, clusters_reviews = plot_clustered_movie_emotional_type(normalize_review_emotions(df_reviews_with_emotions), df_genres, True, clusters_color = {0: "palevioletred", 1: "steelblue"}, clusters_col_subplot = {1:2, 0:1})

As we can see, the mean genre proportions of the two clusters seem similar for the emotional profiles based on movie plots and for those based on reviews.

In the plots above, we swapped the cluster IDs for the emotional profile of the movie plots to make the visual comparison easier.

Now we would like to know whether our intuition is right. Let's compute the Spearman and Pearson correlations on the genre proportions.

In [18]:
correlations = [(0, 1), (1,0)]
for correlation in correlations:
    plot_cluster = plot_emotions_mean_genres.drop("cluster", axis=1)[plot_emotions_mean_genres["cluster"] == correlation[0]]
    review_cluster = reviews_emotions_mean_genres.drop("cluster", axis=1)[reviews_emotions_mean_genres["cluster"] == correlation[1]]

    spearmanr_corr = spearmanr(plot_cluster.values[0], review_cluster.values[0])
    pearsonr_corr = pearsonr(plot_cluster.values[0], review_cluster.values[0])

    print(f"The spearmanr correlation for the cluster {correlation[0]} from the plot emotions clustering and the cluster {correlation[1]} from the reviews emotions clustering is {spearmanr_corr.statistic} with pvalue {spearmanr_corr.pvalue}")
    print(f"The pearsonr correlation for the cluster {correlation[0]} from the plot emotions clustering and the cluster {correlation[1]} from the reviews emotions clustering is {pearsonr_corr.statistic} with pvalue {pearsonr_corr.pvalue}")
The spearmanr correlation for the cluster 0 from the plot emotions clustering and the cluster 1 from the reviews emotions clustering is 0.9649915302089217 with pvalue 4.318827029042323e-13
The pearsonr correlation for the cluster 0 from the plot emotions clustering and the cluster 1 from the reviews emotions clustering is 0.9671644430824458 with pvalue 2.2961232121301665e-13
The spearmanr correlation for the cluster 1 from the plot emotions clustering and the cluster 0 from the reviews emotions clustering is 0.953698475437606 with pvalue 6.747954124153527e-12
The pearsonr correlation for the cluster 1 from the plot emotions clustering and the cluster 0 from the reviews emotions clustering is 0.9630338264056796 with pvalue 7.381549173428688e-13

As we can see the correlations are high and statistically significant. It is interesting to see that we seem to have two types of movies, with emotional profiles reflected in both the plots and the reviews left by the viewers.

Emotion transitions throughout movies

Does every genre seem to follow its own pre-defined emotional flow?

We wanted to see how emotions evolve throughout a movie plot depending on the genre. So we constructed a transition matrix for each genre, where each row is the probability distribution of the transitions out of one emotion: entry (i,j) is the probability of moving to emotion j at the next step, given that we are in emotion i at the current step.
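
The construction of such a row-stochastic matrix can be sketched as below. The sketch assumes that each plot has already been reduced to a sequence of dominant emotion labels (for instance one per sentence), which is an assumption about the preprocessing; the function name `transition_matrix` is hypothetical and is not the code behind `plot_heat_map_transitions_plotly`.

```python
import numpy as np
import pandas as pd

EMOTIONS = ["joy", "sadness", "anger", "disgust", "fear", "surprise"]

def transition_matrix(sequences):
    """Estimate P(next emotion = j | current emotion = i) from sequences of labels."""
    index = {emotion: i for i, emotion in enumerate(EMOTIONS)}
    counts = np.zeros((len(EMOTIONS), len(EMOTIONS)))
    for seq in sequences:
        # Count every consecutive (current, next) pair in the sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[index[current], index[nxt]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Divide each row by its total; rows with no outgoing transition stay at 0.
    probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return pd.DataFrame(probs, index=EMOTIONS, columns=EMOTIONS)
```

For the two toy sequences `["fear", "fear", "sadness"]` and `["fear", "joy"]`, the "fear" row assigns probability 1/3 each to fear, sadness and joy, and every row that has at least one observed transition sums to 1.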

In [19]:
plot_heat_map_transitions_plotly(emotions_split_df)
In [20]:
plot_heat_map_transitions_plotly(emotions_split_df, genre="War")

Interpretation

The high probability of transitioning from surprise to fear, for instance, is entirely expected. Sudden events like ambushes or explosions are staples of war movies, and they almost inevitably lead to fear as characters react to the unexpected. Similarly, fear often persists, reflected in the transition from fear to fear, or gives way to sadness, as the weight of loss and devastation takes hold. Anger also plays a significant role, frequently shifting to fear or sadness. This progression captures the raw intensity of emotions in conflict, where anger at injustice or betrayal can quickly dissolve into the harsh reality of the battlefield.

In [21]:
plot_heat_map_transitions_plotly(emotions_split_df, genre="Disaster")

Interpretation

The patterns here align beautifully with the emotional intensity and unpredictability of disaster narratives. The high transition from fear to fear makes perfect sense. In disaster films, fear is often the dominant emotion, with characters facing constant threats and uncertainties. This persistence of fear reflects the ongoing tension that keeps audiences on edge. Similarly, surprise transitions to fear are natural, as unexpected catastrophic events like earthquakes or explosions are central to the genre. The transition from sadness to fear and disgust to fear also feels logical. Sadness, often evoked by loss, merges with fear as characters confront the harsh realities of survival. Anger to disgust reflects the frustration and moral outrage that often arise in disaster films. Interestingly, the transition from sadness to joy stands out. This captures those rare but powerful moments of relief or triumph. These transitions provide emotional depth and balance, giving audiences a sense of hope amid the chaos.

In [22]:
plot_heat_map_transitions_plotly(emotions_split_df, genre="Drama")

Interpretation

The high probability of transitioning from anger to sadness is a hallmark of dramas. Anger, often rooted in conflict or betrayal, frequently shifts to sadness as characters come to terms with loss or regret. The prominence of surprise to sadness and surprise to fear also stands out because dramas often feature unexpected revelations or events that evoke fear or sorrow, reflecting the unpredictable nature of life and relationships. Additionally, disgust transitions to sadness emphasize the moral and ethical dilemmas often explored in the genre, where disillusionment leads to deeper emotional pain. In this type of movie, characters navigate a spectrum of feelings, creating stories that resonate deeply with audiences through their authenticity and emotional truth.

In [23]:
plot_heat_map_transitions_plotly(emotions_split_df, genre="Comedy")

Interpretation

The transition from surprise to joy and sadness to joy stands out, reflecting how comedy often turns unexpected or gloomy moments into laughter and relief. This ability to take the audience from low to high emotions is a signature strength of the genre, creating its characteristic charm. Surprise to surprise and fear to fear highlight the genre’s use of suspense and exaggerated situations to keep audiences engaged. These transitions mirror the unexpected twists and heightened stakes that are typical in comedic storytelling. And finally, transitions between joy and sadness illustrate the emotional balance in comedy, where moments of happiness are interspersed with small dips to keep the story engaging.

In [24]:
plot_heat_map_transitions_plotly(emotions_split_df, genre="Romance")

Interpretation

One of the most striking transitions is from sadness to joy, which encapsulates the uplifting moments that are so integral to romance. These transitions often represent reconciliation, declarations of love, or happy endings, where sadness transforms into pure happiness. Similarly, joy to joy reflects the feel-good nature of romance films, where moments of happiness linger. Surprise to sadness and sadness to sadness highlight the dramatic turns in romance, often involving heartbreak, misunderstandings, or longing. These transitions reflect the genre’s focus on emotional vulnerability and the challenges of love. Aaah we all know this! Anger transitions to sadness and disgust transitions to sadness emphasize the emotional complexity of relationships, where conflicts and disappointments often lead to moments of reflection and emotional depth…

Sankey Diagram

Here, we take a closer look at how emotions transition within the films themselves, separating beginning->middle from middle->end, using Sankey diagrams.

We split each movie as follows: the first part (Beginning) corresponds to the first 1/4 of the movie, the second part (Middle) to the 1/2 that follows the beginning, and the third part (End) to the last 1/4 of the movie.
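
The quarter / half / quarter split can be illustrated with a short helper. It assumes a movie is represented as an ordered list of items (for instance the sentences of its plot summary) and rounds the quarter down; both the representation and the rounding rule are assumptions, and the helper name is hypothetical.

```python
def split_beginning_middle_end(items):
    """Split an ordered list into its first quarter, middle half and last quarter."""
    n = len(items)
    quarter = n // 4  # rounded down, so very short lists get an empty beginning and end
    beginning = items[:quarter]
    middle = items[quarter:n - quarter]
    end = items[n - quarter:]
    return beginning, middle, end
```

For a plot of eight sentences, this yields two sentences for the beginning, four for the middle and two for the end.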

We only plotted certain genres to not overwhelm the notebook.

In [25]:
plot_separated_sankey_plotly(emotions_split_df)
In [26]:
plot_separated_sankey_plotly(emotions_split_df, genre="Disaster")

Interpretation

Fear and sadness dominate, growing through the middle and peaking toward the end, while anger dissipates after the beginning but resurfaces slightly later. The emotional intensity reflects the continuous tension found in disaster narratives, where moments of triumph highlight resilience amidst chaos.

In [27]:
plot_separated_sankey_plotly(emotions_split_df, genre="Propaganda")

Interpretation

Initially, disgust and anger dominate, setting a confrontational tone designed to evoke strong reactions. Fear and sadness intertwine with these, heightening tension and creating a sense of urgency. Surprise and joy appear as subtler threads, hinting at manipulation or emotional pivots. As we move from the Middle to the End, anger remains a powerful force, reflecting the sustained intensity typical of propaganda. Disgust and fear persist but become more interwoven, underscoring ongoing tension or conflict. Interestingly, joy gains prominence toward the end, suggesting resolutions or calls to action that aim to inspire or rally support. Surprise remains a dynamic element, keeping the audience engaged.

Global interpretation

This analysis reveals how the film industry not only uses emotions to tell stories but also to shape and define genres themselves. Each genre is uniquely distinguished, not just by its overall emotional tone, but by the transitions that weave one feeling into another. From the interplay of joy and sadness in romances to the dominance of fear and surprise in war films, these patterns highlight how emotions give genres their unique identity. Rather than serving as a mere backdrop to the narrative, emotions are wielded as a defining force, shaping genres into powerful tools that influence how we connect with films.

Moreover, the behavior of emotions within genres closely mirrors real life, which these genres are meant to illustrate. Whether it’s the comfort of joy in comedies, the suspenseful fear of thrillers, or the bittersweet sadness of dramas, these emotional dynamics reflect the complexities of human experience. This purposeful orchestration ensures that every genre offers its own distinct emotional rhythm, not only captivating audiences but also resonating with the realities they live, leaving them immersed in unforgettable cinematic experiences.

Emotion evolution and variations throughout a movie

Let's now dive a little deeper into the same analysis. We look not only at the emotion distributions or transitions, but also at how the proportion of each emotion increases or decreases throughout a movie, and at which emotions vary the most. We used two types of plots: a bar plot and a scatter plot. The bar plot gives the emotion distribution in each time slot, whereas the scatter plot shows the variation of each emotion between the current time slot and the previous one.
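
To make the notion of variation concrete, the sketch below computes the relative change, in percent, of each emotion between consecutive time slots. Reading the variations as relative percentage changes is an assumption on our side, the function name is hypothetical, and the numbers in the example are made up for illustration.

```python
import pandas as pd

def emotion_variations(distribution: pd.DataFrame) -> pd.DataFrame:
    """Percentage change of each emotion column relative to the previous time slot.

    Rows are time slots in chronological order; the first row has no
    predecessor and is therefore NaN.
    """
    return distribution.pct_change() * 100

# Made-up proportions, for illustration only.
example = pd.DataFrame(
    {"joy": [0.4, 0.2, 0.3], "anger": [0.1, 0.2, 0.1]},
    index=["beginning", "middle", "end"],
)
variations = emotion_variations(example)
```

Here joy drops by 50% from the beginning to the middle (0.4 to 0.2) and then rises by 50% from the middle to the end (0.2 to 0.3), which is the kind of figure quoted in the interpretations.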

We only plotted certain genres to not overwhelm the notebook.

In [28]:
df_emotions_by_genre_time = construct_emotions_by_genre_and_time_df(plot_emotions_df, df_genres)
plot_bar_and_scatter_emotion_evolution(df_emotions_by_genre_time, "All Genres", all_genres=True)
In [29]:
plot_bar_and_scatter_emotion_evolution(df_emotions_by_genre_time, "Comedy", all_genres=False)

Interpretation

Comedy movies are emotional rollercoasters in their own way. When we’re halfway through the movie, expecting more laughs, boom! Things take a turn. That cheerful vibe from the start? It takes a descent, dropping by 31.73%. And guess what jumps in to steal the spotlight? Anger, skyrocketing by a whopping 35.59%! It’s like the characters have landed in some massive argument or chaotic mess, and we’re caught up in the drama. Honestly, the middle of the movie feels more like a rollercoaster than a comedy. But then, the ending comes around, and everything shifts again. Joy makes the ultimate comeback, soaring with a crazy +48.45% boost. It’s like all the chaos from before melts away, and the movie finally delivers that feel-good, laugh-out-loud ending we’ve been waiting for. Anger, disgust, and fear all take a backseat, fading into the background, while sadness and surprise chill out, letting joy take the spotlight.

In [30]:
plot_bar_and_scatter_emotion_evolution(df_emotions_by_genre_time, "Teen", all_genres=False)

Interpretation

Apparently, in the middle of a teen movie, emotions hit a boiling point. Joy takes a sharp dive (-37.07%), while anger surges (+37.59%), capturing the tension and conflicts typical of teenage life. Fear and disgust nudge upwards slightly, adding awkward and tense undertones. It’s the chaotic, emotional whirlwind that makes the teen genre so relatable. We’ve all been through these times! By the end, things calm down. Joy surges ahead with an impressive increase (+72.97%), while Surprise makes a modest comeback (+4.86%), signaling resolution and growth. Fear (-19.60%), disgust (-19.30%), and anger (-11.63%) fade, as the drama settles. The story ends up joyfully with hope and a bit of surprise, wrapping up the emotional rollercoaster with a touch of optimism. Classic teen vibes!

In [31]:
plot_bar_and_scatter_emotion_evolution(df_emotions_by_genre_time, "Propaganda", all_genres=False)

Interpretation¶

In the middle of a propaganda movie, the emotions take a clear shift. Joy takes a steep dive (-33.43%), making way for more intense emotions like anger, which rises sharply (+25.34%). Fear also sees a notable increase (+20.67%), adding a layer of tension and urgency to the narrative. Disgust edges up slightly (+5.38%), while sadness and surprise drop by 8.89% and 4.87%, respectively. This part of the movie seems to focus on heightening conflict and driving home its message with stronger, more polarizing emotions. As the film concludes, the mood stabilizes somewhat. Joy begins to recover (+13.25%), while fear (-13.32%), disgust (-17.16%), and anger (-6.95%) subside, reflecting a resolution or fulfillment of the film’s emotional arc. Surprise also picks up (+12.08%), perhaps marking an impactful or thought-provoking ending. Sadness lingers with a slight rise (+2.13%), leaving a mixed emotional aftertaste. The ending wraps up with a balance of intensity and resolution.

In [32]:
plot_bar_and_scatter_emotion_evolution(df_emotions_by_genre_time, "Western", all_genres=False)

Interpretation¶

In the middle of a western, anger dominates the variation, surging by 20.68%, likely reflecting intense confrontations or conflicts. Fear also rises (+7.51%), adding tension, while joy plummets (-27.70%), shifting from fun moments to a more serious, intense vibe. Sadness and disgust show minor declines, keeping the focus on escalating tension and conflict. By the end, sadness steals the show with a big rise (+27.72%), while joy (+10.93%) and surprise (+12.04%) make small comebacks, maybe hinting at some resolution or twists. Disgust and fear fade away (-15.39% and -14.10%, respectively), wrapping up the story in classic western style.

Emotion trends over year periods¶

Each time of year has its own emotional vibe (Halloween evokes fear, New Year inspires joy, ...). To determine whether the film industry aligns its marketing strategies with these seasonal moods, we computed, from the movie plots, the variation in the score of each emotion from one period of the year to the next.
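To make the idea of a period-to-period variation concrete, here is a minimal sketch. It is not the implementation behind `plot_variation`. It assumes a hypothetical `release_month` column, one score column per emotion, a variation expressed as a percentage change of the monthly mean, and a wrap-around so that January is compared with December.

```python
import pandas as pd


def monthly_emotion_variation(df, emotions, month_col="release_month"):
    """Percentage change of each mean emotion score from one month to the next.

    `month_col` and the percentage-change definition are assumptions made
    for illustration only.
    """
    # Mean score of every emotion per month, ordered by month number.
    monthly_mean = df.groupby(month_col)[emotions].mean().sort_index()
    # Previous month's means; the first row wraps around to the last month,
    # so that January is compared with December.
    previous = monthly_mean.shift(1)
    previous.iloc[0] = monthly_mean.iloc[-1]
    return (monthly_mean - previous) / previous * 100


# Toy data with three months only, to keep the example readable.
toy = pd.DataFrame({
    "release_month": [1, 1, 2, 3],
    "joy": [0.1, 0.3, 0.3, 0.4],
    "fear": [0.4, 0.4, 0.2, 0.1],
})
variation = monthly_emotion_variation(toy, ["joy", "fear"])
print(variation.round(2))
```

A positive value means the emotion is stronger on average than in the previous month, and a negative value means it is weaker.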

In [33]:
periodic_emotions_diff_plot = plot_variation("plot", df_movies_with_emotions_normalized)

Interpretation¶

The month-by-month variation in plot emotions tells a fascinating story about how sentiments evolve over the course of the year. Each trend does seem synchronized with key moments of the year, reflecting a close link between the film industry and the emotions that mark those moments in real life.

Moreover, we see that joy shows a marked increase from December to January in the movie plots. Is this really surprising, given the festive spirit of the New Year?

In February, it’s Disgust and Sadness that are climbing, no doubt due to the complex stories of Valentine’s Day, where love isn’t always a fairytale. We all know: love can be as complicated as the feelings it evokes.

When March arrives, Surprise and Joy take over, marking the arrival of spring and renewed and cheerful storylines.

In April, Fear and Anger leap forward, coinciding with Easter, a time when tense, mysterious tales seduce audiences.

In August-September, it’s Disgust that stands out, with a little sadness too, probably influenced by stories dealing with the sometimes brutal realities of the end of summer and the return to school.

Most intriguing of all is the rise of Surprise and Fear in October, just in time for Halloween. Coincidence? Certainly not, since these two emotions are known to become the mainstays of the dark and scary stories that dominate screens around this time.

In November, Thanksgiving brings Joy back to the screen, capturing the spirit of gratitude and celebration.

Finally, in December, Surprise dominates once again, with a touch of joy, brought on by the magic of Christmas and the unexpected endings that light up the season.

These trends show how the film industry adjusts its stories to the different times of year. None of these variations is trivial: each echoes the moments shared by audiences at that time of year, supporting the idea that cinema is an emotional mirror, tuned to each season.

We performed the same variation computations on the emotion scores of the reviews, to see whether the emotional trends in reviews over the year align with those found in the plots.

In [34]:
periodic_emotions_diff_review = plot_variation("review", df_reviews_with_emotions_normalized, df_movies)

Interpretation¶

In the reviews, Fear drops over the New Year period, followed by the rise of Disgust in February, mirroring the emotions of the plots. In March, there’s a noticeable rise in Joy, linked to the arrival of spring, but from then on the differences with the plots widen. During June and July, Joy remains dominant, unlike in the plots, where more varied emotions take over. Sadness makes its appearance in August and September, probably influenced by the end of summer and the return to routine, while in October, Surprise gains ground with Halloween, as in the plots. At Thanksgiving, Sadness and Surprise mix, before Joy triumphs in December, driven by the festive spirit.

These divergences show that viewers are more likely to express their overall feelings, influenced by the season, the general atmosphere of the movie and the overall cinematic experience, rather than by the intrinsic and complex emotions of the stories.

To test whether the period-to-period emotional trends of plots and reviews differ throughout the year, we computed Pearson's correlation between the two series of variations for each emotion. The null hypothesis is that there is no linear correlation between them. For most emotions, the p-value is too high to reject it, so we find no evidence that the variations of emotions in plots and reviews move together across the months.
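As an illustration of this test, the sketch below applies `scipy.stats.pearsonr` (already imported at the top of the notebook) to each emotion. It is not the code of `corr_p_value_plot_periods`. The input layout (one row per period, one column per emotion) and the 0.05 threshold are assumptions.

```python
import pandas as pd
from scipy.stats import pearsonr


def correlate_variations(plot_var, review_var, alpha=0.05):
    """Pearson correlation between plot and review variations, per emotion.

    Both inputs are assumed to hold one row per period and one column per
    emotion, with the periods in the same order.
    """
    rows = []
    for emotion in plot_var.columns.intersection(review_var.columns):
        r, p = pearsonr(plot_var[emotion], review_var[emotion])
        rows.append({
            "emotion": emotion,
            "pearson_r": r,
            "p_value": p,
            # True only when the null hypothesis of no correlation is rejected.
            "significant": p < alpha,
        })
    return pd.DataFrame(rows).set_index("emotion")


# Toy variations over five periods (made-up numbers).
plot_var = pd.DataFrame({"joy": [1, 2, 3, 4, 5], "fear": [1, 2, 3, 4, 5]})
review_var = pd.DataFrame({"joy": [2, 4, 6, 8, 11], "fear": [2, -1, 3, 0, 1]})
result = correlate_variations(plot_var, review_var)
print(result)
```

A p-value above the threshold only means that a correlation could not be detected. It does not prove that the two series are unrelated.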

In [35]:
corr_p_value_plot_periods(periodic_emotions_diff_plot, periodic_emotions_diff_review)

Interpretation¶

This graph emphasizes the differences in period-to-period emotional trends between plots and reviews throughout the year. Among all analyzed emotions, only Disgust shows a significant correlation (p = 0.020), indicating consistency between the two sources for this specific emotion. For the other emotions, the correlation is not significant, which is consistent with viewers expressing emotions that differ from those conveyed by film narratives over the months.

Emotion map of movie plots¶

On the following graph, we can explore the most dominant emotion in movie plots across decades and countries.
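For readers who want a feel for the aggregation behind such a map, here is a minimal sketch. It assumes that "most dominant" means the emotion with the highest mean score among a country's movies in a given decade, and it uses hypothetical `country` and `release_year` columns. The actual logic of `plot_world_map_emotion_by_genre` may differ.

```python
import pandas as pd


def dominant_emotion_by_country_decade(df, emotions,
                                       country_col="country",
                                       year_col="release_year"):
    """Emotion with the highest mean score per (country, decade)."""
    # 1947 -> 1940, 1953 -> 1950, and so on.
    decade = (df[year_col] // 10) * 10
    mean_scores = (
        df.assign(decade=decade)
          .groupby([country_col, "decade"])[emotions]
          .mean()
    )
    return mean_scores.idxmax(axis=1).rename("dominant_emotion")


# Toy data (made-up scores).
toy_movies = pd.DataFrame({
    "country": ["USA", "USA", "USA"],
    "release_year": [1941, 1945, 1955],
    "sadness": [0.6, 0.5, 0.2],
    "anger": [0.2, 0.3, 0.7],
})
dominant = dominant_emotion_by_country_decade(toy_movies, ["sadness", "anger"])
print(dominant)
```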

In [36]:
plot_world_map_emotion_by_genre(df_movies_with_emotions_normalized, df_countries, is_reviews=False)

Interpretation¶

The United States of America, for instance, starts by having mostly sad movies released until 1950, then it enters into a big spiral of anger.

Even Finland, the world’s happiest country, seems to be stuck between sadness and disgust and only displays joy once. While hovering over different countries, we observe that Joy and Surprise rarely appear on the map and are almost systematically the lowest-scoring emotions for countries. If we draw a little historical parallel, we can think of the 20th century as the century of the two world wars, so perhaps this is why violent or sad movies are the ones that have emerged the most. Once again, filmmakers of the time may have leaned into these themes to reflect the collective trauma and struggles, using cinema as both a mirror of harsh reality and a tool to forever etch these defining events into memory. They don’t just reflect the trends of the moment month by month, but adapt their stories, themes, and above all, the overarching emotions to translate the upheavals of the times and the defining events of each decade.

Emotion map of reviews¶

This time, we’ll map the most dominant emotion of the reviews across decades and countries.

In [37]:
plot_world_map_emotion_by_genre(df_reviews_with_emotions_normalized, df_countries, is_reviews=True)

Interpretation¶

We can clearly see that a country whose movies are mostly angry, sad, scary, or disgusting is not necessarily perceived that way by viewers. In 1950, movies across all countries bring joy to the reviewers. And if you scroll through time, you’ll notice something remarkable: reviews around the world are generally filled with joy. Isn’t that fantastic? A universal spark of happiness, transcending borders and eras! But sometimes, emotions like disgust and fear can also dominate reviews in some countries. Take Peru in 1970 or Russia in 2010, for instance, where fear and disgust dominate both plots and reviews.

What does this tell us about the average rating of these movies per country? Does joy being the most dominant emotion mean that the average rating will be high all over the world too? Let’s investigate!

The size of each circle scales with the natural logarithm of the number of movies of the country, multiplied by a scaling factor. This avoids huge differences between the circles of countries with a very large number of movies and those of countries with fewer movies.
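The effect of the logarithm can be seen in a small sketch. The scaling factor of 5 is a hypothetical value chosen for illustration, not the one used in `plot_world_map_average_rating`.

```python
import math


def circle_size(n_movies, scale=5.0):
    """Marker size proportional to ln(number of movies).

    `scale` is a hypothetical scaling factor.
    """
    if n_movies < 1:
        raise ValueError("n_movies must be at least 1")
    return scale * math.log(n_movies)


small = circle_size(100)
large = circle_size(10_000)
print(small, large, large / small)
```

Here the second country has 100 times more movies than the first, yet its circle is only about twice as large. Note that with this exact formula a country with a single movie gets a size of 0.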

In [38]:
plot_world_map_average_rating(df_movies_with_emotions_normalized, df_countries)

Ratings per reviews emotion¶

We drew a bar plot to better visualize how the ratings are distributed depending on the dominant emotion in the reviews. We did this because the previous maps showed that joy is the dominant emotion in many countries, and we wanted to see whether having joy as the dominant emotion in a review necessarily implies a good rating. Along the way, it gives an idea of how the ratings change with the emotions in the reviews.
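A minimal sketch of the underlying grouping, assuming one score column per emotion and a hypothetical `rating` column, could look like the following. It only computes the quartiles and does not reproduce `plot_ratings_by_most_dominant_emotion`.

```python
import pandas as pd


def rating_quartiles_by_dominant_emotion(df, emotions, rating_col="rating"):
    """Quartiles of the rating, grouped by each review's dominant emotion."""
    # The dominant emotion of a review is the one with the highest score.
    dominant = df[emotions].idxmax(axis=1).rename("dominant_emotion")
    return df.groupby(dominant)[rating_col].quantile([0.25, 0.5, 0.75]).unstack()


# Toy reviews (made-up scores and ratings).
toy_reviews = pd.DataFrame({
    "joy": [0.8, 0.7, 0.9, 0.1, 0.2, 0.1],
    "disgust": [0.1, 0.2, 0.05, 0.7, 0.6, 0.8],
    "rating": [9, 10, 10, 3, 5, 7],
})
summary = rating_quartiles_by_dominant_emotion(toy_reviews, ["joy", "disgust"])
print(summary)
```

The column labelled `0.5` is the median, and `0.25` and `0.75` are the first and third quartiles.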

In [39]:
plot_ratings_by_most_dominant_emotion(df_reviews_with_emotions)

Interpretation¶

This analysis highlights how the dominant emotion of a review relates to the movie rating. Joy, known for its universal appeal and uplifting effect, consistently resonates with audiences and goes with higher ratings overall: it comes first, with a median close to 10, although it has some low-rating outliers. On the other hand, disgust, tied to discomfort and negative reactions, goes with lower ratings: as expected, it comes last, with a median of 5 and a third quartile of 7. These findings emphasize the powerful role emotions play in shaping the audience's perception and evaluation of films.

Emotions per rating¶

And now let's analyze it the other way around! On the following graphs, we can explore the emotion distribution of movie plots and reviews across different average movie rating bins.
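To illustrate what binning by rating means, here is a minimal sketch built on `pandas.cut`. The unit-width bins from 1 to 10 and the `average_rating` column name are assumptions. `emotion_distribution_by_movie_rating` may bin differently.

```python
import pandas as pd


def mean_emotions_by_rating_bin(df, emotions, rating_col="average_rating"):
    """Mean emotion scores per rating bin (1-2, 2-3, ..., 9-10)."""
    edges = list(range(1, 11))
    labels = [f"{low}-{low + 1}" for low in range(1, 10)]
    # include_lowest=True keeps movies rated exactly 1 in the first bin.
    binned = pd.cut(df[rating_col], bins=edges, labels=labels, include_lowest=True)
    # observed=False keeps empty bins in the result (their mean is NaN).
    return df.groupby(binned, observed=False)[emotions].mean()


# Toy movies (made-up scores and ratings).
toy_rated = pd.DataFrame({
    "average_rating": [1.5, 1.8, 9.5],
    "joy": [0.2, 0.4, 0.6],
})
by_bin = mean_emotions_by_rating_bin(toy_rated, ["joy"])
print(by_bin)
```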

In [40]:
emotion_distribution_by_movie_rating(df_movies_with_emotions_normalized, is_review=False)
emotion_distribution_by_movie_rating(df_movies_with_emotions_normalized, df_reviews_with_emotions_normalized, is_review=True)
In [41]:
emotion_distribution_by_movie_rating(df_movies_with_emotions_normalized, df_reviews_with_emotions_normalized, is_review=False, specific_emotion="joy")
emotion_distribution_by_movie_rating(df_movies_with_emotions_normalized, df_reviews_with_emotions_normalized, is_review=True, specific_emotion="joy")

Interpretation¶

For Joy, we observe an interesting trend: movies with the most joyful plots tend to fall into two categories—either very poorly rated films (1–2/10) or exceptionally well-rated ones (9–10/10). In contrast, for reviews, joy exhibits a linear increase with the movie rating, suggesting that audiences associate higher joy with better-rated films.

In [42]:
emotion_distribution_by_movie_rating(df_movies_with_emotions_normalized, df_reviews_with_emotions_normalized, is_review=False, specific_emotion="sadness")
emotion_distribution_by_movie_rating(df_movies_with_emotions_normalized, df_reviews_with_emotions_normalized, is_review=True, specific_emotion="sadness")

Interpretation¶

Regarding Sadness, its distribution in movie plots is fairly consistent, with a slight increase in films rated between 6 and 9/10. As for reviews, sadness remains relatively stable across ratings, though it noticeably drops for highly rated movies with scores above 9/10.

In [43]:
emotion_distribution_by_movie_rating(df_movies_with_emotions_normalized, df_reviews_with_emotions_normalized, is_review=False, specific_emotion="anger")
emotion_distribution_by_movie_rating(df_movies_with_emotions_normalized, df_reviews_with_emotions_normalized, is_review=True, specific_emotion="anger")

Interpretation¶

Anger shows a notable trend: highly-rated movies (>9/10) have higher levels of anger in their plots. On the other hand, reviews demonstrate an inverse relationship—movies with better ratings tend to elicit less anger in their reviews.

In [44]:
emotion_distribution_by_movie_rating(df_movies_with_emotions_normalized, df_reviews_with_emotions_normalized, is_review=False, specific_emotion="disgust")
emotion_distribution_by_movie_rating(df_movies_with_emotions_normalized, df_reviews_with_emotions_normalized, is_review=True, specific_emotion="disgust")

Interpretation¶

For Disgust, its presence in plots remains fairly stable across all rating levels. However, in reviews, a clear trend emerges: the better the movie rating, the less disgust is expressed in the reviews.

In [45]:
emotion_distribution_by_movie_rating(df_movies_with_emotions_normalized, df_reviews_with_emotions_normalized, is_review=False, specific_emotion="fear")
emotion_distribution_by_movie_rating(df_movies_with_emotions_normalized, df_reviews_with_emotions_normalized, is_review=True, specific_emotion="fear")

Interpretation¶

Fear exhibits a peak in plots of movies rated 3–4/10, while the least fearful plots are associated with movies rated 9 and above. The same trend is observed in reviews, where higher-rated movies show lower levels of fear.

In [46]:
emotion_distribution_by_movie_rating(df_movies_with_emotions_normalized, df_reviews_with_emotions_normalized, is_review=False, specific_emotion="surprise")
emotion_distribution_by_movie_rating(df_movies_with_emotions_normalized, df_reviews_with_emotions_normalized, is_review=True, specific_emotion="surprise")

Interpretation¶

For Surprise, plots reveal an inverse relationship: the better a movie is rated, the less surprise is conveyed in its plot. This intriguing observation could hint at a key to a film’s success—avoiding excessive surprise in the storyline. In reviews, however, the trend is reversed, with movies rated 9/10 or higher exhibiting higher levels of surprise compared to others.

Conclusion¶

We have shown how the movie industry masterfully leverages emotions, not only to tell stories but to define and reshape genres themselves. Each emotion plays a distinct role in crafting unforgettable cinematic experiences. Each genre finds its unique identity through a deliberate orchestration of emotions, whether the comforting joy of comedies, the gripping fear of thrillers or the poignant sadness of dramas. Joy lights up the screen in romances and musicals, Fear dominates the suspenseful twists of thrillers, Sadness quietly thrives in dramas, Anger fuels the energy and intensity of action and crime narratives, Disgust unsettles viewers in experimental and horror movies, and Surprise keeps audiences on the edge of their seats in science fiction and adventure movies. The emotions weave intricate transitions, transforming genres into emotional symphonies that captivate audiences and resonate deeply with their personal experiences. This reveals how filmmakers strategically craft emotional journeys to leave lasting impressions on us. This design ensures that the emotions not only enrich the storytelling, but also create unforgettable cinematic experiences and leave audiences immersed in the magic of the silver screen!